Advanced Geospatial Data Processing for Social Scientists
Dennis Abel & Stefan Jünger
2025-04-28
Day
Time
Title
April 28
10:00-11:15
Introduction
April 28
11:15-11:30
Coffee Break
April 28
11:30-13:00
Raster data in R
April 28
13:00-14:00
Lunch Break
April 28
14:00-15:15
Raster data processing
April 28
15:15-15:30
Coffee Break
April 28
15:30-17:00
Graphical display of raster data in maps
April 29
10:00-11:15
Datacube processing I
April 29
11:15-11:30
Coffee Break
April 29
11:30-13:00
Datacube processing II & API access
April 29
13:00-14:00
Lunch Break
April 29
14:00-15:15
Data integration and linking (with survey data)
April 29
15:15-15:30
Coffee Break
April 29
15:30-17:00
Outlook and open session with own application
Our plan
Thus far, we have learned about
Different data formats
How to load them
First steps in interacting with them
In this session, you will learn
How to wrangle raster data even further
Linking to vector data
Manipulating the raster values
How to converse from one format into the other
Subsetting
Cropping raster data
Cropping is a method of cutting out a specific ‘slice’ of a raster layer based on an input dataset or geospatial extent, such as a bounding box. We often do this to ‘zoom’ in on a dataset or to make our computations more efficient. Let’s pretend we are mainly interested in the small-scale population size of Eastern Germany. For this purpose, we can use the bounding box of Eastern Germany.
Bounding box in R
The easiest way (in my opinion) to create a bounding box in R is to use the sf::st_bbox() function, possibly based on another geospatial dataset.
If we only want to add one attribute from a vector dataset Y to another vector dataset X, we can conduct a spatial join using sf::st_join() as shown earlier. In the raster data world, these operations are called raster extractions.
Raster data are helpful when we aim to
Apply calculations that are the same for all geometries in the dataset
Extract information from the raster fast and efficient
Pulling in some data
For this effort, we re-import the synthetic survey geocoordinates. We sample 100 geocoordinates from the whole dataset, as we only need a few to demonstrate the procedures that are followed.
There’s a multitude of arguments that we can adjust to conduct the extraction. An important option is from which raster cells we want our extraction done. Here’s an example of how the argument terra::extract(..., touches = FALSE/TRUE) works.
touches = FALSE vs. touches = TRUE
Let’s see how the values differ when we apply the option or not.
Often, we default to the mean of raster cell values to be extracted. In our example, calculating the sum is more relevant as we deal with population counts. Or the maximum?
We can use the same procedure to aggregate a raster dataset into a vector polygon dataset. That’s a widespread use case. Let’s load our German districts vector dataset in ./data/. This time, we will use WorldPop raster data on gender ratios from 2020.
Again, we use terra::extract() to create the aggregated data, which we can add to our vector dataset.
german_districts$gender_ratios_2020 <- terra::extract( gender_ratios_2020, german_districts, fun = mean, na.rm =TRUE, ID =FALSE ) |>unlist()plot(german_districts["gender_ratios_2020"])
Manipulating the raster data
Digging deeper
The previous steps are efforts that work on the raw raster cell values data. However, there are occasions where we might want to work on the raster data themselves or pre-process them to add them to another dataset (e.g., a vector file). Some of these procedures are for visualization, such as heat maps, and others are necessary for our later analyses. We will show a few in the following.
Creating a quick ‘heat map’
Population counts for the whole of Germany help us to identify urban and rural clusters. But there may be applications where we need to zoom in on a specific city to identify within-city variations regarding specific attributes like foreigner shares. Let’s do that for the German capital Berlin. For this purpose, we subset the district data and mask and crop data on foreigner shares from the German census.
Although we can identify some clusters using the raw data, some smoothing may be helpful. We can use the terra::focal() function to do that. It applies a moving window filter on all raster cells of a grid. We’ll have a more detailed look at this function in a minute.
foreigners_berlin_smooth <- terra::focal( foreigners_berlin, w =matrix(1, 5, 5), fun = mean, na.rm =TRUE )plot(foreigners_berlin_smooth)
‘Real’ point pattern analysis
Usually, when we talk about heat maps, we mean analyzing point patterns and whether they spatially cluster. terra might not be your best choice regarding more elaborated techniques to estimate density kernels and distance base bandwidths between points to draw clusters. For this purpose, packages such as spatstat are better suited but require learning about other data structures. That said, densities are more advanced ways of counting things in raster grid cells, and we can mimic this behavior also with terra. So, let’s stick to this package and crop all of our points to the extent of Berlin.
points_berlin <-readRDS("../../data/synthetic_survey_coordinates.rds" ) |> sf::st_crop(berlin)plot( points_berlin["foreigner"], col =c("black", "green"))
Creating a raster density template
Next, we simply want to count points with the attribute foreigner = 1 in raster grid cells for our density estimation. However, our points are sparse compared to our comprehensive raster dataset. We cannot initially rely on 1 km² grid cells as with the Census grid data. But 5 km² may be a good compromise. Let’s do that!
You may wonder why we are doing that. The answer is simple: We want to count the number of points in each grid cell. We use the function terra::rasterize() as a simple technique.
Now, this is not very pleasant. These data looks… not good. But fear not, working with raster data is powerful, as we now use a function you already know for projecting one CRS into another: terra::project(). This function can also be used to aggregate and disaggregate data based on the structure of another dataset. So, while we could not initially use 1 km² grid cells for our density ‘estimation’, we can reproject our 5 km² onto a 1 km² grid like our Census grid data. A bit of masking also helps get rid of cells that are not within the Berlin border.
We can also apply smoothing as with the population grid data.
terra::focal( foreigners_berlin, w =matrix(1, 5, 5), fun = mean, na.rm =TRUE) |>plot()
focal( points_berlin_density, w =matrix(1, 5, 5), fun = mean,na.rm =TRUE) |>plot()
Playing with the smoothing
terra::focal( foreigners_berlin, w =matrix(1, 3, 3), fun = mean, na.rm =TRUE) |>plot()
focal( points_berlin_density, w =matrix(1, 3, 3), fun = mean,na.rm =TRUE) |>plot()
Playing with the smoothing
terra::focal( foreigners_berlin, w =matrix(1, 9, 9), fun = mean, na.rm =TRUE) |>plot()
focal( points_berlin_density, w =matrix(1, 9, 9), fun = mean,na.rm =TRUE) |>plot()
What is this argument w?
It’s magic. Just kidding, but it is indeed really powerful. It builds around the idea of connecting the value of a focal grid cell to the values of surrounding grid cells, as in the figure below. Hence, the name of the function terra::focal() where the argument is used.
It’s just a simple base R matrix
When using this matrix as input and applying the statistic fun = mean, we change the value of the focal grid cell to the mean of the values of itself and the 8 surrounding grid cells. That’s what we did before in one example.
terra::focal( foreigners_berlin, w =matrix(1, 5, 5), fun = mean, na.rm =TRUE) |>plot()
focal( foreigners_berlin, w = weighted_w, fun = mean,na.rm =TRUE) |>plot()
Real life example: Edges of immigrant rates
In ethnic diversity research, whether sudden changes in the neighborhood composition may increase the potential for conflicts between groups is a relevant question. Researchers use edge detection algorithms from image processing to investigate such changes spatially.
What is edge detection?
Edge detection identifies sudden color changes in an image to ‘draw’ borders of things in a picture. Here’s an example of the prominent Sobel filter.